Missing Value Estimation in DNA Microarrays Using B-Splines
Authors
Abstract
Gene expression profiles generated by high-throughput microarray experiments usually take the form of large, high-dimensional matrices. Unfortunately, microarray experiments can generate data sets with multiple missing values, which significantly affect the performance of subsequent statistical analysis and machine learning algorithms. Numerous imputation algorithms have been proposed to estimate the missing values. However, most of these algorithms treat gene expression profiles as discrete data and fail to take into account that gene expression measurements form continuous time series. In this paper, we present a new approach, FDVSplineImpute, for time series gene expression analysis that permits the estimation of missing observations using B-splines of similar genes from fuzzy difference vectors. We have used smoothing splines to relax the fit of the splines so that they are less prone to overfitting the data. Our algorithm shows significant improvement over the current state-of-the-art
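The general idea of filling gaps in a time series profile with a smoothing B-spline can be sketched as follows. This is only an illustration, not the FDVSplineImpute algorithm: it fits a spline to a single gene's observed time points and ignores the similar-gene and fuzzy difference vector steps named in the abstract. The function name `impute_profile`, the `smoothing` value, and the toy data are assumptions made for the example; the spline routines are SciPy's `splrep` and `splev`.

```python
import numpy as np
from scipy.interpolate import splrep, splev


def impute_profile(times, values, smoothing=0.5, degree=3):
    """Fill NaN entries of one expression profile with a smoothing B-spline.

    Hypothetical helper for illustration only; it uses a single profile,
    not the similar-gene scheme described in the abstract.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    if observed.sum() <= degree:
        raise ValueError("need more than `degree` observed points to fit the spline")
    # s > 0 relaxes the fit, so the curve need not pass through every
    # observed point and is less prone to overfitting noise.
    tck = splrep(times[observed], values[observed], k=degree, s=smoothing)
    filled = values.copy()
    # Evaluate the fitted spline only at the missing time points.
    filled[~observed] = splev(times[~observed], tck)
    return filled


# Toy profile (assumed data): a smooth signal with two missing time points.
times = np.arange(10.0)
profile = np.sin(times / 3.0)
profile[[3, 7]] = np.nan
filled = impute_profile(times, profile)
```

Setting `smoothing=0` would force the spline through every observed point; larger values trade fidelity for smoothness, which is the relaxation the abstract refers to.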
Similar resources
Collateral Missing Value Estimation: Robust Missing Value Estimation for Consequent Microarray Data Processing
Microarrays have the unique ability to probe thousands of genes at a time, which makes them a useful tool for a variety of applications, ranging from diagnosis to drug discovery. However, data generated by microarrays often contain multiple missing gene expressions that affect the subsequent analysis, as most of the time these missing values are ignored. In this paper we have analyzed how accurate esti...
Missing value estimation methods for DNA microarrays
MOTIVATION: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values....
Heuristic Non Parametric Collateral Missing Value Imputation: A Step Towards Robust Post-genomic Knowledge Discovery
Microarrays are able to measure the patterns of expression of thousands of genes in a genome to give profiles that facilitate much faster analysis of biological processes for diagnosis, prognosis and tailored drug discovery. Microarrays, however, commonly have missing values which can result in erroneous downstream analysis. To impute these missing values, various algorithms have been proposed ...
Robust SVD Method for Missing Value Estimation of DNA Microarrays
A majority of DNA microarray datasets contain missing or corrupt values and it is critical to estimate these values accurately. These missing values are most often attributed to insufficient experimental resolution or the presence of foreign objects on the experimental slide’s surface. To improve existing missing value estimation algorithms, this paper introduces and investigates the scalable s...